Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repository_ctx: Allow renaming archive entries during extraction. #16052

Closed

Conversation

jmillikin
Copy link
Contributor

@jmillikin jmillikin commented Aug 5, 2022

Adds a rename_files= parameter to the extract() and download_and_extract() methods of repository_ctx. This new parameter takes a dict of archive entries to rename, and is applied prior to any prefix adjustment.

@jmillikin
Copy link
Contributor Author

jmillikin commented Aug 5, 2022

Use cases include:

  • Archives where multiple files have the same name in different case. On platforms with case-insensitive filesystems (macOS, Windows) these archives can't be extracted as-is, and the files can't be fixed up after extraction.
    • One such project is the Linux kernel, which has files named like net/netfilter/xt_dscp.c and net/netfilter/xt_DSCP.c (among others).
  • Archives containing non-UTF8 filenames fail to extract on platforms or filesystems that enforce Unicode paths.

A longer example of extracting the Linux kernel on macOS (which is case-insensitive by default):

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

def _linux_repository(ctx):
    ctx.file("WORKSPACE", "workspace(name = {name})\n".format(
        name = repr(ctx.name),
    ))
    ctx.file("BUILD.bazel", """
genrule(
    name = "nf_srcs",
    srcs = glob(["net/netfilter/*"]),
    outs = ["nf_srcs.txt"],
    cmd = "ls -1 external/linux/net/netfilter/ > $@",
)
""")

    ctx.download_and_extract(
        url = ["https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.19.tar.xz"],
        sha256 = "ff240c579b9ee1affc318917de07394fc1c3bb49dac25ec1287370c2e15005a8",
        stripPrefix = "linux-5.19",
        rename_files = _rename_files("5.19"),
    )

def _rename_files(version):
    netfilter_srcs = {
        "xt_DSCP.c": "xt_DSCP_target.c",
        "xt_HL.c": "xt_HL_target.c",
        "xt_RATEEST.c": "xt_RATEEST_target.c",
        "xt_TCPMSS.c": "xt_TCPMSS_target.c",
    }
    renames = {}
    for orig_name, new_name in netfilter_srcs.items():
        prefix = "linux-{}/net/netfilter/".format(version)
        renames[prefix + orig_name] = prefix + new_name
    # and similar logic for netfilter headers under include/uapi/linux/
    return renames

linux_repository = repository_rule(
    implementation = _linux_repository,
)
$ tar -tf ~/downloads/linux-5.19.tar.xz | grep -i xt_dscp.c
linux-5.19/net/netfilter/xt_DSCP.c
linux-5.19/net/netfilter/xt_dscp.c
$ bazel build @linux//:nf_srcs
[...]
Target @linux//:nf_srcs up-to-date:
  bazel-bin/external/linux/nf_srcs.txt
$ cat bazel-bin/external/linux/nf_srcs.txt | grep -i xt_dscp
xt_DSCP_target.c
xt_dscp.c

@jmillikin jmillikin force-pushed the ctx-extract-rename-outputs branch from 2856e8d to 429558c Compare August 5, 2022 12:47
@@ -749,8 +749,22 @@ public StructImpl download(
+ " archive. Instead of needing to specify this prefix over and over in the"
+ " <code>build_file</code>, this field can be used to strip it from extracted"
+ " files."),
@Param(
name = "renameFiles",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first time I actually noticed that stripPrefix has a camel case name - that looks like a bug as Starlark parameters usually use snake case. Using snake case for renameFiles at least wouldn't complicate future efforts to clean up this inconsistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@sgowroji sgowroji added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. awaiting-review PR is awaiting review from an assigned reviewer labels Aug 5, 2022
@jmillikin
Copy link
Contributor Author

Gentle ping

Copy link
Member

@Wyverald Wyverald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

private final Decompressor decompressor;

private DecompressorDescriptor(
String targetKind, String targetName, Path archivePath, Path repositoryPath,
@Nullable String prefix, boolean executable, Decompressor decompressor) {
@Nullable String prefix, boolean executable,
@Nullable Map<String, String> renameFiles,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't need to be nullable (it's never null anyway, and null has the same semantics as empty). also remove all the null checks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -164,6 +175,12 @@ public Builder setExecutable(boolean executable) {
return this;
}

@CanIgnoreReturnValue
public Builder setRenameFiles(Map<String, String> renameFiles) {
this.renameFiles = renameFiles;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defensively copy here (ImmutableMap.copyOf)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

doc =
"An optional dict specifying files to rename during the extraction. Archive entries"
+ " with names exactly matching a key will be renamed to the value, prior to"
+ " any directory prefix adjustment."),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add more information on how this parameter can be used, like you did in the PR comments? or maybe even link to this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

doc =
"An optional dict specifying files to rename during the extraction. Archive entries"
+ " with names exactly matching a key will be renamed to the value, prior to"
+ " any directory prefix adjustment."),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* extraction process.
*/
@Test
public void testDecompressWithRenamedFiles() throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a test case where strip_prefix is also present, to test that rename_files happens before strip_prefix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@Wyverald Wyverald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@Wyverald Wyverald added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Aug 23, 2022
@sgowroji sgowroji removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Aug 24, 2022
aiuto pushed a commit to aiuto/bazel that referenced this pull request Oct 12, 2022
Adds a `rename_files=` parameter to the `extract()` and `download_and_extract()` methods of `repository_ctx`. This new parameter takes a dict of archive entries to rename, and is applied prior to any prefix adjustment.

Closes bazelbuild#16052.

PiperOrigin-RevId: 469686270
Change-Id: Ia8bc37ba1150ee1ece8ef4f613c26d0f93ca988b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants